Introducing a Performance Model for Bandwidth-Limited Loop Kernels
نویسندگان
چکیده
We present a performance model for bandwidth limited loop kernels which is founded on the analysis of modern cache based microarchitectures. This model allows an accurate performance prediction and evaluation for existing instruction codes. It provides an in-depth understanding of how performance for different memory hierarchy levels is made up. The performance of raw memory load, store and copy operations and a stream vector triad are analyzed and benchmarked on three modern x86-type quad-core architectures in order to demonstrate the capabilities of the model.
منابع مشابه
Multi-core architectures: Complexities of performance prediction and the impact of cache topology
The balance metric is a simple approach to estimate the performance of bandwidth-limited loop kernels. However, applying the method to in-cache situations and modern multi-core architectures yields unsatisfactory results. This paper analyzes the influence of cache hierarchy design on performance predictions for bandwidth-limited loop kernels on current mainstream processors. We present a diagno...
متن کاملRobust Controller Design Based-on Aerodynamic Load Simulator Identification Driven by PMSM for Hardware-in-the-Loop Simulations
Aerodynamic load simulators generate the required time varying load to test the actuator’s performance in the laboratory. Electric Load Simulator (ELS) as one of variety of the dynamic load simulators should follows the rotation of the Under Test Actuator (UTA) and applies the desired torque to UTA’s rotor at the same time. In such a situation, a very large torque is imposed to the ELS from the...
متن کاملA-New-Closed-form-Mathematical-Approach-to-Achieve Minimum Phase Noise in Frequency Synthesizers
The aim of this paper is to minimize output phase noise for the pure signal synthesis in the frequency synthesizers. For this purpose, first, an exact mathematical model of phase locked loop (PLL) based frequency synthesizer is described and analyzed. Then, an exact closed-form formula in terms of synthesizer bandwidth and total output phase noise is extracted. Based on this formula, the phase ...
متن کاملDesign of a Single-Layer Circuit Analog Absorber Using Double-Circular-Loop Array via the Equivalent Circuit Model
A broadband Circuit Analogue (CA) absorber using double-circular-loop array is investigated in this paper. A simple equivalent circuit model is presented to accurately analyze this CA absorber. The circuit simulation of the proposed model agrees well with full-wave simulations. Optimization based the equivalent circuit model, is applied to design a single-layer circuit analogue absorber using d...
متن کاملWave Equation Based Stencil Optimizations on Multi-core CPU
As the engine for seismic imaging algorithms, stencil kernels modeling wave propagation are both computeand memoryintensive. This work targets improving the performance of wave equation based stencil code parallelized by OpenMP on a multi-core CPU. To achieve this goal, we explored two techniques: improving vectorization by using hardware SIMD technology, and reducing memory traffic to mitigate...
متن کامل